
feat(llm-core): SSE streaming parser with UTF-8 tail handling #469

Open
quangdang46 wants to merge 3 commits into master from feat/E2-sse-parser
@quangdang46

Owner

Summary

Implements E2 — a reusable SSE streaming parser in jcode-llm-core to replace per-provider ad-hoc SSE parsing.

What's Included

crates/jcode-llm-core/src/sse.rs (680 lines)

  • SseEvent — parsed event with event, data, id, retry fields. Event-type interning (Cow::Borrowed) for common Anthropic/OpenAI events eliminates per-event String allocation.
  • SseParser — streaming state machine. Supports:
    • CR, LF, CRLF line endings
    • UTF-8 BOM stripping (SSE spec)
    • Configurable per-event data cap (100 MB default), plus a 10 MB buffer cap
    • Zero-copy fast path when buffer is empty
    • feed() returns complete events; flush() handles stream-end
    • 15 unit tests covering all parse states
  • SseStream<S>: a futures::Stream wrapper converting Result<Vec<u8>, _> → Result<SseEvent, _>. Handles UTF-8 multi-byte characters split across chunk boundaries.
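
To make the feed()/flush() contract concrete, here is a minimal std-only sketch of a line-oriented SSE state machine. It is not the code in sse.rs: the names `Event` and `LineParser` are hypothetical, and the sketch leaves out BOM stripping, the data and buffer caps, the retry field, and event-type interning. It assumes the input is already valid UTF-8 text and, following the SSE spec, drops an event that never receives its terminating blank line.

```rust
/// Illustrative event type; the field names echo the PR summary, but this
/// layout is an assumption, not the real `SseEvent`.
#[derive(Debug, Default, PartialEq)]
pub struct Event {
    pub event: Option<String>,
    pub data: String,
    pub id: Option<String>,
}

/// Hypothetical streaming parser: buffers partial lines between `feed` calls.
#[derive(Default)]
pub struct LineParser {
    buf: String, // unconsumed text, possibly ending mid-line or on a lone CR
    cur: Event,  // event currently being assembled
}

impl LineParser {
    /// Consumes a chunk and returns every event completed by it.
    pub fn feed(&mut self, chunk: &str) -> Vec<Event> {
        let mut buf = std::mem::take(&mut self.buf);
        buf.push_str(chunk);
        let mut out = Vec::new();
        let mut start = 0;
        while let Some(pos) = buf[start..].find(|c: char| c == '\r' || c == '\n') {
            let end = start + pos;
            let bytes = buf.as_bytes();
            let term = match (bytes[end], bytes.get(end + 1).copied()) {
                (b'\r', Some(b'\n')) => 2, // CRLF
                (b'\r', None) => break,    // may be half of a CRLF; wait for more
                _ => 1,                    // lone LF or lone CR
            };
            self.line(&buf[start..end], &mut out);
            start = end + term;
        }
        self.buf = buf[start..].to_string();
        out
    }

    /// End of stream: a held-back trailing CR now counts as a terminator.
    /// Any event still missing its blank line is discarded.
    pub fn flush(&mut self) -> Vec<Event> {
        let mut out = Vec::new();
        let rest = std::mem::take(&mut self.buf);
        if let Some(line) = rest.strip_suffix('\r') {
            self.line(line, &mut out);
        }
        self.cur = Event::default();
        out
    }

    fn line(&mut self, line: &str, out: &mut Vec<Event>) {
        if line.is_empty() {
            // Blank line dispatches the event, unless no data was collected.
            let mut ev = std::mem::take(&mut self.cur);
            if !ev.data.is_empty() {
                ev.data.pop(); // drop the trailing '\n' added per data line
                out.push(ev);
            }
            return;
        }
        if line.starts_with(':') {
            return; // comment line
        }
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "event" => self.cur.event = Some(value.to_string()),
            "id" => self.cur.id = Some(value.to_string()),
            "data" => {
                self.cur.data.push_str(value);
                self.cur.data.push('\n');
            }
            _ => {} // unknown fields are ignored
        }
    }
}
```

Holding back a CR that ends a chunk is the one subtle choice here: without it, a CRLF split across two chunks would be read as two line endings and could dispatch an event early.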

docs/pr-plans/E2-sse-parser.md (46 lines)

Plan file with research summary, gap analysis, and implementation decisions.

Why a Shared Module?

Previously, SSE parsing was embedded per-provider (Anthropic has inline parsing; OpenRouter too). This module:

  • Eliminates duplicated parse logic
  • Adds UTF-8 tail handling (partial multi-byte chars at chunk boundaries)
  • Adds proper EventSource-compatible SseStream for ergonomic provider code
  • Follows the pi-agent-rust reference implementation (MIT license)
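
As an illustration of the UTF-8 tail handling mentioned above, the following std-only sketch decodes byte chunks while holding back an incomplete multi-byte sequence until the next chunk arrives. `Utf8Tail` is a hypothetical name and not necessarily how SseStream does it. Replacing invalid bytes with U+FFFD is also an assumption; the real module may surface an error instead.

```rust
use std::str;

/// Hypothetical helper: decodes byte chunks to text, keeping the bytes of a
/// character that was split across a chunk boundary.
#[derive(Default)]
pub struct Utf8Tail {
    pending: Vec<u8>, // undecoded bytes carried over from earlier chunks
}

impl Utf8Tail {
    /// Appends `chunk` and returns all text that can be decoded so far.
    pub fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        loop {
            match str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // The prefix up to `valid` was just validated, so this
                    // second decode cannot fail.
                    out.push_str(str::from_utf8(&self.pending[..valid]).unwrap());
                    match e.error_len() {
                        // Input ended mid-character: keep the tail for later.
                        None => {
                            self.pending.drain(..valid);
                            return out;
                        }
                        // Invalid bytes: substitute U+FFFD and keep decoding.
                        Some(bad) => {
                            out.push('\u{FFFD}');
                            self.pending.drain(..valid + bad);
                        }
                    }
                }
            }
        }
    }
}
```

For example, "é" encodes as the bytes 0xC3 0xA9. If a chunk ends with 0xC3, `push` returns only the text before it, and the next call that begins with 0xA9 yields the completed character.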

Verification

  • cargo check -p jcode-llm-core — clean (0 new errors)
  • All 15 SseParser tests pass
  • SseStream verified by cargo check (stream boilerplate; parser tests cover all parsing paths)

Refs: #E2, docs/pr-plans/E2-sse-parser.md

Add consolidated PR backlog from 13 reference repos (A-J, ~80 features)
and supporting docs (MASTER_GOAL_PROMPT, GOAL_DRIVEN_PROMPT, CONSOLIDATED_FINDINGS).
Implements a reusable SSE parser module in jcode-llm-core:
- SseEvent with event-type interning
- SseParser state machine (BOM, CR/LF/CRLF, data cap, chunked parsing)
- SseStream futures::Stream wrapper with UTF-8 tail accumulation
- 15 unit tests covering all parsing paths

Refs: docs/pr-plans/E2-sse-parser.md